ktlab | knowledgebase

原文翻译#

这个部分是对 JUC 包中的 AbstractQueueSynchroizer 类说明的翻译。

AQS 提供一个框架，可以用于实现依赖于先进先出（FIFO）等待队列的阻塞锁和相关同步器（信号量、事件等）。此类提供了一个同步器脚手架，对于大多数依赖于单个原子 int 值来表示状态的同步器，提供一些通用的方法实现，从而在此基础上可以实现定制逻辑。
子类必须定义更改此状态的受保护方法，并对该状态用于获取或释放锁时表示的意义作出适当的解释。基于这些，此类中的其他方法负责所有的排队和阻塞机制。
子类可以维护其他状态字段，但只有使用方法 getState、setState 和 compareAndSetState 原子更新的 int 值在同步方面受到跟踪。

Provides a framework for implementing blocking locks and related synchronizers (semaphores, events, etc) that rely on first-in-first-out (FIFO) wait queues. This class is designed to be a useful basis for most kinds of synchronizers that rely on a single atomic int value to represent state. Subclasses must define the protected methods that change this state, and which define what that state means in terms of this object being acquired or released. Given these, the other methods in this class carry out all queuing and blocking mechanics. Subclasses can maintain other state fields, but only the atomically updated int value manipulated using methods getState, setState and compareAndSetState is tracked with respect to synchronization.

子类应定义为非公共的内部辅助类 ( non-public internal helper)，用于实现其外围类的同步特性。类 AbstractQueuedSynchronizer 并不实现任何同步接口。相反，它定义了一些辅助方法（如 acquireInterruptibly），具体锁和相关同步器的实现可以适当调用这些方法来实现同步。

Subclasses should be defined as non-public internal helper classes that are used to implement the synchronization properties of their enclosing class. Class AbstractQueuedSynchronizer does not implement any synchronization interface. Instead it defines methods such as acquireInterruptibly that can be invoked as appropriate by concrete locks and related synchronizers to implement their public methods.

此类支持默认独占或共享模式（也可以两者都支持）。
当以独占模式获取时，其他线程的获取尝试无法成功。
当以共享模式获取时，其他线程获取尝试可能会（不一定）成功。
这个类不 “理解” 这些差异，只是机械的执行：
在共享模式获取成功时，下一个等待线程（如果存在）也必须确定是否它也可以获取。
在不同模式下的等待线程共享相同的 FIFO 队列。
一般而言，子类实现仅支持其中一种模式，但也自行可同时支持两种模式（例如 ReadWriteLock）。仅支持独占模式或共享模式的子类无需重新实现另一种模式的方法。

This class supports either or both a default exclusive mode and a shared mode. When acquired in exclusive mode, attempted acquires by other threads cannot succeed. Shared mode acquires by multiple threads may (but need not) succeed. This class does not "understand" these differences except in the mechanical sense that when a shared mode acquire succeeds, the next waiting thread (if one exists) must also determine whether it can acquire as well. Threads waiting in the different modes share the same FIFO queue. Usually, implementation subclasses support only one of these modes, but both can come into play for example in a ReadWriteLock. Subclasses that support only exclusive or only shared modes need not define the methods supporting the unused mode.

该类定义了一个内部嵌套类 AbstractQueuedSynchronizer.ConditionObject，支持独占模式的子类可将其用作 Condition 的实现，对于这些子类， isHeldExclusively 方法报告当前线程是否独占的持有同步状态，方法 release 完全释放此对象（其中参数的作用是与当前的getState值一起，经过计算进行状态更新，状态值表征的含义由实现者决定），方法 acquire的参数给出要保存的状态值（这里说的给出要保存的状态值不是直接更新状态，而是根据getState的状态计算后行状态更新），最终将此对象（Synchronizer）恢复到acuqire之前的状态。没有其他 AbstractQueuedSynchronizer 方法可以创建此类条件，因此如果无法满足此约束，不要使用它。当然，AbstractQueuedSynchronizer.ConditionObject 的具体行为取决于同步器实现赋予给它的语义。

Note

这一段表意很不清晰，尽管已知在可重入锁中，release 和acquire的arg参数分别表示要释放和获取的锁的数量，但是与这里对release和acquire的解释很不契合

method release invoked with the current getState value fully releases this object, and acquire, given this saved state value, eventually restores this object to its previous acquired state.

This class defines a nested AbstractQueuedSynchronizer.ConditionObject class that can be used as a Condition implementation by subclasses supporting exclusive mode for which method isHeldExclusively reports whether synchronization is exclusively held with respect to the current thread, method release invoked with the current getState value fully releases this object, and acquire, given this saved state value, eventually restores this object to its previous acquired state. No AbstractQueuedSynchronizer method otherwise creates such a condition, so if this constraint cannot be met, do not use it. The behavior of AbstractQueuedSynchronizer.ConditionObject depends of course on the semantics of its synchronizer implementation.

该类（AQS）提供对内部队列的检查、检测和监视方法，条件对象类也提供了类似方法。这些方可可以按需在实现类中使用，从而通过AbstractQueuedSynchronizer实现同步机制。

This class provides inspection, instrumentation, and monitoring methods for the internal queue, as well as similar methods for condition objects. These can be exported as desired into classes using an AbstractQueuedSynchronizer for their synchronization mechanics.

该类（AQS）在序列化时仅存储维护状态的底层原子整数，因此反序列化后的对象具有空线程队列。通常需要序列化的子类将定义一个 readObject 方法，在反序列化时将其还原到已知的初始状态。

Serialization of this class stores only the underlying atomic integer maintaining state, so deserialized objects have empty thread queues. Typical subclasses requiring serializability will define a readObject method that restores this to a known initial state upon deserialization.

源码剖析#

经过这一段不清不楚的介绍，似乎仍然还是不太能理解。AQS 做了什么事情。
不过通过查看源码则不难发现，AQS 定义了下面这些东西：

对节点 (Node) 的定义
节点的头尾指针，也就是一个双向链表。作为同步队列 (Sync Queue)
一个意义不明的 state 的值
一些对同步队列进行基本安全操作的方法。
留给子类实现的release 和 acquire 方法。
一个完整的，不可变的ConditionObject内部类
1. Condition Object实现了Condition 也就是条件变量
2. 其内部则由一条由Node构成的链表，也就是条件变量的等待队列
3. 实现了await和signal的行为，这二者会操作绑定到的 AQS 对象的SyncQueue。
  下面一一剖析这些定义的内容及其作用。

Node 及双向链表（同步队列）#

Node 部分定义了：

SHARED和EXCLUSIVE两个常量，表明节点的采用的模式
四种等待状态：CANCELLED，SIGNAL，CONDITION，PROPAGATE，由waitStatus表示。
thread指明节点代表的线程。
next和 prev 字段，用于指明链表的前驱和后继节点。
nextWaiter下一个等待者，暂时意义不明，我们也不是很关心。

loading...
这就是 Node 的基本结构。
但是双链表的重点不在于本身的数据结构，而在于如何操作这个双链表。

通过源码去查看链表head 和 tail 的调用链可以发现，head 操作（出队）只出现在 acquire 相关方法中，tail操作（入队）出现在 acquire 以及ConditionObject的await相关方法中。
此外再无别的方法可以进行出入队的操作。

源码注释如下：

Wait queue node class.
The wait queue is a variant of a "CLH" (Craig, Landin, and Hagersten) lock queue. CLH locks are normally used for spinlocks. We instead use them for blocking synchronizers, but use the same basic tactic of holding some of the control information about a thread in the predecessor of its node. A "status" field in each node keeps track of whether a thread should block. A node is signalled when its predecessor releases. Each node of the queue otherwise serves as a specific-notification-style monitor holding a single waiting thread. The status field does NOT control whether threads are granted locks etc though. A thread may try to acquire if it is first in the queue. But being first does not guarantee success; it only gives the right to contend. So the currently released contender thread may need to rewait.
To enqueue into a CLH lock, you atomically splice it in as new tail. To dequeue, you just set the head field.
+------+ prev +-----+ +-----+
head | | <---- | | <---- | | tail
+------+ +-----+ +-----+

Insertion into a CLH queue requires only a single atomic operation on "tail", so there is a simple atomic point of demarcation from unqueued to queued. Similarly, dequeuing involves only updating the "head". However, it takes a bit more work for nodes to determine who their successors are, in part to deal with possible cancellation due to timeouts and interrupts.
The "prev" links (not used in original CLH locks), are mainly needed to handle cancellation. If a node is cancelled, its successor is (normally) relinked to a non-cancelled predecessor. For explanation of similar mechanics in the case of spin locks, see the papers by Scott and Scherer at http:// www. cs. rochester. edu/ u/ scott/ synchronization/
We also use "next" links to implement blocking mechanics. The thread id for each node is kept in its own node, so a predecessor signals the next node to wake up by traversing next link to determine which thread it is. Determination of successor must avoid races with newly queued nodes to set the "next" fields of their predecessors. This is solved when necessary by checking backwards from the atomically updated "tail" when a node's successor appears to be null. (Or, said differently, the next-links are an optimization so that we don't usually need a backward scan.)
Cancellation introduces some conservatism to the basic algorithms. Since we must poll for cancellation of other nodes, we can miss noticing whether a cancelled node is ahead or behind us. This is dealt with by always unparking successors upon cancellation, allowing them to stabilize on a new predecessor, unless we can identify an uncancelled predecessor who will carry this responsibility.
CLH queues need a dummy header node to get started. But we don't create them on construction, because it would be wasted effort if there is never contention. Instead, the node is constructed and head and tail pointers are set upon first contention.
Threads waiting on Conditions use the same nodes, but use an additional link. Conditions only need to link nodes in simple (non-concurrent) linked queues because they are only accessed when exclusively held. Upon await, a node is inserted into a condition queue. Upon signal, the node is transferred to the main queue. A special value of status field is used to mark which queue a node is on.
Thanks go to Dave Dice, Mark Moir, Victor Luchangco, Bill Scherer and Michael Scott, along with members of JSR-166 expert group, for helpful ideas, discussions, and critiques on the design of this class.
等待队列节点类。
等待队列是 “CLH”（Craig、Landin 和 Hagersten）锁队列的变体。CLH 锁通常用于旋转锁。相反，我们使用它们来阻塞同步器，但使用相同的基本策略，即在其节点的前身中保存有关线程的一些控制信息。每个节点中的 “状态” 字段跟踪线程是否应阻塞。当节点的前身释放时，会发出信号。否则，队列的每个节点都充当一个特定的通知样式监视器，其中包含一个等待线程。但是，状态字段不控制是否授予线程锁定等。如果线程在队列中排在第一位，则线程可能会尝试获取。但成为第一并不能保证成功；它只赋予了争辩的权利。因此，当前发布的竞争者线程可能需要重新等待。
要排队到 CLH 锁中，您需要将其原子拼接为新的尾部。要取消排队，您只需设置 head 字段。
+------+ prev +-----+ +-----+
head | | <---- | | <---- | | tail
+------+ +-----+ +-----+

插入 CLH 队列只需要在 “tail” 上进行一次原子操作，因此有一个从未排队到排队的简单原子分界点。同样，取消排队仅涉及更新 “头部”。但是，节点需要更多的工作来确定它们的继任者是谁，部分原因是为了处理由于超时和中断而可能取消的问题。
“prev” 链接（未在原始 CLH 锁中使用）主要用于处理取消。如果一个节点被取消，它的继任者（通常）会重新链接到一个未取消的前置节点。有关旋转锁情况下类似机制的解释，请参阅 Scott 和 Scherer 的论文，http:// www. cs. rochester. edu/ u/ scott/ synchronization/
我们还使用 “下一个” 链接来实现阻塞机制。每个节点的线程 ID 都保存在其自己的节点中，因此前置节点通过遍历下一个链接来确定它是哪个线程，从而向下一个节点发出唤醒信号。确定继任者必须避免使用新排队的节点来设置其前任的 “下一个” 字段。必要时，当节点的继任者显示为 null 时，通过从原子更新的 “尾部” 向后检查来解决此问题。（或者，换句话说，下一个链接是一种优化，因此我们通常不需要向后扫描。
取消为基本算法引入了一些保守性。由于我们必须轮询取消其他节点，因此我们可能会错过注意到取消的节点是在前面还是在后面。这是通过在取消时始终取消继任者来处理的，允许他们在新的前任上稳定下来，除非我们能确定一个未取消的前任来承担这一责任。
CLH 队列需要一个虚拟标头节点才能启动。但是我们不会在施工中创建它们，因为如果从不争用，那将是浪费精力。相反，在第一次争用时构造节点并设置正面和反面指针。
等待条件的线程使用相同的节点，但使用其他链接。条件只需要链接简单（非并发）链接队列中的节点，因为只有在独占持有时才能访问这些节点。等待时，节点将插入到条件队列中。发出信号后，节点被转移到主队列。状态字段的特殊值用于标记节点所在的队列。
感谢 Dave Dice、Mark Moir、Victor Luchangco、Bill Scherer 和 Michael Scott 以及 JSR-166 专家组的成员，他们对本课程的设计提出了有益的想法、讨论和批评。

state 值#

在 AQS 中，并不直接使用state，但是暴露给了子类，这个的含义就比较清晰了，AQS 不对 state做解释，拿 state 做什么，取决于子类，说明中也是如此，子类也可以维护其他状态字段，但只有使用方法 getState、setState 和 compareAndSetState 原子更新的 int 值在同步方面受到跟踪。这只是一种 optional 的，表示状态的字段。

一些对同步队列进行基本安全操作的方法#

enq(Node node)，入队一个节点
setHead(Node node)，setHeadAndPropagate(Node node)，addWaiter 等队列操作方法。

如果线程在争用时，获取到许可，进入临界区，那么当前线程的节点就会从 AQS 队列中消失。尽管在很多文章给出的图中，正在临界区中执行给出的线程在同步队列的头部，但查看源码不难发现，正在临界区执行的线程不需要在队列中的任何一个位置。

java:highlight={3-4,13-18}

private void setHead(Node node) {  
    head = node;  
    node.thread = null;  // 令头节点不指向任何线程
    node.prev = null;  
}

final boolean acquireQueued(final Node node, int arg) {  
    boolean failed = true;  
    try {  
        boolean interrupted = false;  
        for (;;) {  
            final Node p = node.predecessor();  
            if (p == head && tryAcquire(arg)) {  
                setHead(node); // 让成功获取的线程对应节点成为头节点 
                p.next = null; // help GC  
                failed = false;  
                return interrupted;  
            }  
            if (shouldParkAfterFailedAcquire(p, node) &&  
                parkAndCheckInterrupt())  
                interrupted = true;  
        }  
    } finally {  
        if (failed)  
            cancelAcquire(node);  
    }
}

这隐含了一个假设，当临界区线程退出临界区的时候，一定会调用具有 unlock 语义的方法，一方面这通知了 AQS 可以让同步队列中的线程开始新一轮争用，另一方面，将调用 unlock 的职责交给上层使用者，由上层使用者指明临界区的范围，更为灵活。

而当他在临界区中调用 await 时，就会将自己加入条件队列。

队列的操作#

任何调用 acquire 的线程，如果此次获取失败，会将自己挂入到 Sync Queue 上。
而获取到临界区的访问的线程。此时一定不在 SyncQueue 上。

怎么进条件队列？

第一，对于一个处于争用区的线程，他不在 aqs 中，但是具有调用 aqs 相关方法的能力。比如 condtion.await。

这时候会将自己以 condition 状态加入条件队列，同时尝试 release 同步队列。
如果使用正常，release (释放争用区) 操作就不应该失败，那么同步队列中正在获取的线程（如果有）就会争用成功，一切正常。

但是如果使用不正常，release (释放争用区) 操作就可能失败 (可能他就没有拿到过争用区)，线程会抛出异常，那么就会在条件队列中从 condition 状态变成 canceled 状态，成为条件队列中的垃圾。
那么在后面其他线程节点加入到条件队列时就会顺便被清理（addConditionWaiter）。

什么时候出条件队列？
由条件变量的定义可知，当收到来自其他线程的 signal 操作时，条件队列中的节点就有可能被唤醒从条件队列中出去。

其他线程 A 调用 signal 操作，线程 A 可能继续执行，但这不重要，重要的是会让条件队列的第一个 condition 状态的节点 (线程 B)，从条件队列中释放，并且进入同步队列，并且将状态从 condition 变为默认状态 0。

上面简单的表述了，一种进同步队列的步骤，但是进同步队列需要干什么？

线程 B 一旦发现自己进入同步队列，就开始尝试争用临界区。如果争用成功，则走同步队列的出队流程。

但如果争用不成功 (可能自己处在链表后部分，所以争不到)，为了节省资源，暂时需要休眠 / 阻塞，所以先将前面节点的状态设为signal，意为在前面节点线程争用完成准备离开的时候，需要唤醒后面的节点。

将前面的节点设为 signal 之后就可以正常休眠了 (park)

但是，如果前面节点是 canceled，那么这次暂时不休眠，并且将前置指针进行调整，调整到正常的前驱上，继续获取，当前面的节点变为 signal 时，就可以正常休眠了。

而在自己 (node) 取消时，就一直追溯到未取消的前驱节点 A，将 A.next 设置为 node.next。

也就是在同步队列上，节点在取消时将自己从 next 链上剥离下来，节点在 park (获取，但失败了，还有下一次) 前，将前面的所有取消节点从 pre 链上剥离下来。

从而踢出了取消节点。

什么时候出同步队列？

当节点位于同步队列中时，一定是争用失败了，也就是一定有一个或多个争用成功的线程（不在同步队列中），当争用成功的线程离开临界区时，一定会进行 release 操作。一方面，修改 state（可能），表明可用资源增加，另外一方面，通知（unpark）同步队列上的节点，继续争用。

子类需要在 tryRelease 和 tryAcquire 中修改 state，指示争用情况。

Node.PROPAGATE waitStatus 值只能在头节点中设置。仅指示头节点的后继节点将传播 unpark 后继节点的行为。因为当前节点成功获取共享锁定时，后继节点也可能可以成功获取共享锁定。

SIGNAL 表示后继节点已经挂起，需要唤醒
而 0 表示仍在自旋，在可用时可以直接获取到，不需要唤醒。